Geothermal machine learning analysis: Southwest New Mexico

This notebook is part of GTcloud.jl: GeoThermal Cloud for Machine Learning.


Machine learning analyses are performed using the SmartTensors machine learning framework.


This notebook demonstrates how the NMFk module of SmartTensors can be applied to perform unsupervised geothermal machine-learning analyses.


More information on how the ML results are interpreted to provide geothermal insights is available in our research paper.

Import required libraries for this work

If NMFk is not installed, first execute the following in the Julia REPL: import Pkg; Pkg.add("NMFk"); Pkg.add("DelimitedFiles"); Pkg.add("JLD"); Pkg.add("Gadfly"); Pkg.add("Cairo"); Pkg.add("Fontconfig"); Pkg.add("Mads").
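The installation step can also be checked programmatically. The following is a minimal sketch that reports which of the packages listed above cannot be found in the active environment and prints a matching Pkg.add command; it relies only on Base.find_package and does not install anything itself.

```julia
# Packages named in the installation note above.
required = ["NMFk", "DelimitedFiles", "JLD", "Gadfly", "Cairo", "Fontconfig", "Mads"]

# Base.find_package returns `nothing` when a package cannot be located
# in the current load path.
missing_pkgs = [p for p in required if Base.find_package(p) === nothing]

if isempty(missing_pkgs)
    println("All required packages are available.")
else
    # Pkg.add accepts a vector of package names.
    println("Run: import Pkg; Pkg.add(", repr(missing_pkgs), ")")
end
```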

Load and pre-process the data

Set up the working directory containing the SWNM data

Load the data file
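As an illustration of this step, the sketch below reads a delimited text file with DelimitedFiles. The file layout (comma-separated, one header row, location names in the first column) and all names and values are assumptions made for the example, not the actual SWNM file format.

```julia
using DelimitedFiles

# Write a tiny stand-in file so the example is self-contained;
# the column names and values are made up for illustration.
path = joinpath(mktempdir(), "example.csv")
write(path, "name,a1,a2\nsite1,1.0,2.5\nsite2,3.0,4.5\n")

# With header=true, readdlm returns the data block and the header row separately.
data, header = readdlm(path, ','; header=true)

locations = String.(data[:, 1])           # first column: location names
X = Float64.(data[:, 2:end])              # remaining columns: numeric attributes
attributes = String.(vec(header)[2:end])  # attribute names taken from the header
```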

Define names of the data attributes (matrix columns)

Short attribute names are used for coding.

Long attribute names are used for plotting and visualization.

Define attributes to remove from analysis

Define attributes for analysis
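Excluding attributes amounts to building a column mask over the data matrix. A minimal sketch, assuming attributes are stored as matrix columns (as stated above) and using hypothetical attribute names and values:

```julia
# Hypothetical short attribute names and a made-up data matrix
# (rows = locations, columns = attributes).
attributes = ["a1", "a2", "a3", "a4"]
X = [1.0 2.0 3.0 4.0;
     5.0 6.0 7.0 8.0]

# Attributes to exclude from the analysis (hypothetical choice).
attributes_remove = ["a2", "a4"]

# Boolean mask over columns: true where the attribute is kept.
keep = .!in.(attributes, Ref(attributes_remove))

attributes_process = attributes[keep]
X_process = X[:, keep]
```

Using a mask rather than hard-coded column indices keeps the name vector and the data matrix in sync when the list of removed attributes changes.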

Define names of the data locations

Short location names are used for coding.

Long location names are used for plotting and visualization.

Define location coordinates

Set up directories to store results and figures
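A sketch of this step using mkpath from Julia's standard library. The directory name figures-case01 matches the figure path used later in this notebook; results-case01 is an assumed name following the same pattern, and a temporary root is used so the example does not touch the project tree.

```julia
# Use a temporary root so the sketch does not touch the project tree;
# in the notebook the root would be the working directory.
root = mktempdir()

# "figures-case01" matches the figure path used later in this notebook;
# "results-case01" is an assumed name following the same pattern.
figuredir = joinpath(root, "figures-case01")
resultdir = joinpath(root, "results-case01")

# mkpath creates missing parent directories and does nothing if the path exists.
foreach(mkpath, (figuredir, resultdir))
```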

Define a range for number of signatures to be explored

Define and normalize the data matrix
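One common way to normalize such a matrix is to scale each attribute (column) to the range [0, 1]. The sketch below shows this in plain Julia, skipping missing values stored as NaN. It is an assumption that min-max scaling is the scheme applied here; the function normalize_columns and the data are hypothetical and do not reproduce the NMFk API.

```julia
# Made-up data matrix (rows = locations, columns = attributes); NaN marks missing values.
X = [1.0  10.0;
     3.0  NaN;
     5.0  30.0]

# Scale each column to [0, 1] using its minimum and maximum over non-missing entries.
function normalize_columns(X::AbstractMatrix{<:Real})
    Xn = float.(X)
    for j in axes(Xn, 2)
        vals = filter(!isnan, Xn[:, j])
        isempty(vals) && continue          # column with no data: leave as is
        lo, hi = extrema(vals)
        # A constant column has hi == lo; map it to zero to avoid dividing by zero.
        Xn[:, j] .= hi > lo ? (Xn[:, j] .- lo) ./ (hi - lo) : Xn[:, j] .- lo
    end
    return Xn
end

Xu = normalize_columns(X)
```

Missing entries remain NaN after scaling, so they can still be recognized downstream.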

Perform ML analyses

The NMFk algorithm factorizes the normalized data matrix Xu into the matrices W and H. For more information, see the NMFk website.
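To make the factorization concrete, the sketch below approximates a nonnegative matrix X by a product W * H using the classic multiplicative-update rule for nonnegative matrix factorization. It is not the NMFk implementation, which additionally assesses the robustness of the solutions; the function nmf_multiplicative, its parameters, and the data are hypothetical.

```julia
using LinearAlgebra, Random

# Plain NMF with multiplicative updates: X ≈ W * H with W, H ≥ 0.
function nmf_multiplicative(X::AbstractMatrix{<:Real}, k::Integer;
                            iterations::Integer=500, seed::Integer=1)
    rng = MersenneTwister(seed)
    m, n = size(X)
    W = rand(rng, m, k)
    H = rand(rng, k, n)
    tiny = eps(Float64)                     # guards against division by zero
    for _ in 1:iterations
        H .*= (W' * X) ./ (W' * W * H .+ tiny)
        W .*= (X * H') ./ (W * H * H' .+ tiny)
    end
    return W, H
end

# Made-up nonnegative matrix built as a product of two rank-2 factors.
X = [1.0 0.0; 0.5 0.5; 0.0 1.0] * [1.0 2.0 0.0 1.0; 0.0 1.0 3.0 1.0]

W, H = nmf_multiplicative(X, 2)
relative_error = norm(X - W * H) / norm(X)
```

The inner dimension k plays the role of the number of signatures: each column of W paired with the matching row of H describes one signature.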

Here, the NMFk results are loaded from prior ML runs.

As seen from the output above, the NMFk analyses identified that the optimal number of geothermal signatures in the dataset is 8.

Solutions with fewer than 8 signatures underfit the data.

Solutions with more than 8 signatures overfit the data and are unacceptable.

The set of acceptable solutions is defined as follows:

The acceptable solutions contain 2, 3, 4, 5 and 8 signatures.
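The selection logic can be sketched as a filter on robustness followed by a choice on fit. The numbers below are made up for illustration and are not the values from this analysis; the cutoff of 0.25 on the silhouette width is likewise an assumption.

```julia
# Made-up diagnostics for k = 2:6 (NOT the values from this analysis).
ks         = collect(2:6)
fit        = [9.0, 6.5, 4.0, 3.8, 3.7]        # reconstruction error (lower is better)
robustness = [0.95, 0.40, 0.90, 0.10, -0.05]  # silhouette width (higher is better)

# Assumed cutoff: treat a solution as robust when its silhouette width exceeds 0.25.
cutoff = 0.25

mask = robustness .> cutoff
acceptable = ks[mask]

# Among the robust solutions, pick the one with the best fit.
k_optimal = acceptable[argmin(fit[mask])]
```

In this toy case the solutions for k = 5 and k = 6 fit slightly better but are not robust, which is the pattern described above as overfitting.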

Post-process NMFk results

Number of signatures

Plot representing solution quality (fit) and silhouette width (robustness) for different numbers of signatures k:

The plot above also demonstrates that the acceptable solutions contain 2, 3, 4, 5 and 8 signatures.

Analysis of all the acceptable solutions

The ML solutions containing an acceptable number of signatures are further analyzed as follows:

Analysis of the 5-signature solution

The results for a solution with 5 signatures presented above will be further discussed here.

The geothermal attributes are clustered into 5 groups:

This grouping is based on analyses of the attribute matrix W:

[Figure: attributes-3-labeled-sorted]

The well locations are also clustered into 5 groups:

This grouping is based on analyses of the location matrix H:

[Figure: locations-4-labeled-sorted]
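A simple way to derive such groups from the factor matrices is to assign each attribute and each location to the signature with the largest scaled weight. The sketch below uses made-up matrices with two signatures and assumes that rows of W correspond to attributes and columns of H to locations; the clustering performed by NMFk may differ in detail.

```julia
# Made-up factor matrices for k = 2 signatures.
# Assumed orientation: rows of W are attributes, columns of H are locations.
W = [0.9 0.1;
     0.2 0.7;
     0.8 0.3]
H = [0.6 0.1 0.2 0.9;
     0.3 0.8 0.7 0.0]

# Scale each signature by its maximum so signatures are comparable,
# then assign each item to the signature with the largest scaled weight.
attribute_groups = [argmax(W[i, :] ./ vec(maximum(W; dims=1))) for i in axes(W, 1)]
location_groups  = [argmax(H[:, j] ./ vec(maximum(H; dims=2))) for j in axes(H, 2)]
```

Scaling by the per-signature maximum prevents a signature with uniformly large weights from absorbing every item.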

The map ../figures-case01/locations-5-map.html provides interactive visualization of the extracted location groups (the HTML file can also be opened with any browser).

Comparison of the ML solutions against the SWNM physiographic provinces

Spatial association of the extracted signatures with the four physiographic provinces in SWNM is summarized here:

[Figure: signatures]
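A spatial association of this kind can be quantified with a cross-tabulation of location groups against provinces. The sketch below uses made-up group labels and hypothetical province names; it does not reproduce the results of this analysis.

```julia
# Made-up labels for six locations (NOT the results of this analysis).
location_groups = [1, 1, 2, 2, 3, 3]
provinces = ["P1", "P1", "P2", "P2", "P2", "P3"]  # hypothetical province names

groups = sort(unique(location_groups))
province_names = sort(unique(provinces))

# counts[g, p] = number of locations in group g that fall in province p.
counts = zeros(Int, length(groups), length(province_names))
for (g, p) in zip(location_groups, provinces)
    counts[findfirst(==(g), groups), findfirst(==(p), province_names)] += 1
end
```

A row dominated by a single column indicates a group that is concentrated in one province.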

Clearly, the ML algorithm was able to blindly identify the physiographic provinces associated with the analyzed hydrogeothermal systems without being given any information about their locations (coordinates).